32-Bit Mode

Boreal

Let's use some of that 32-bit power that the '386 and newer processors provide. All we need to do is stick an "e" in front of register names and tell the assembler to shift gears from 8088 into '386 mode.

Here's how the DECOUT program can be extended to handle much larger numbers:

                page    240, 132        ;                       <-- New
        ;DECOUT32.ASM
        ;Routine to display a signed, decimal number in the eax register

                .386                    ;                       <-- New
        cseg    segment use16           ;                       <-- New
                assume  cs:cseg, ds:cseg, ss:cseg, es:cseg

                org     100h
        start:  mov     eax, 1234567890 ;                       <-- New
                call    bigDecout
                call    crlf

The "page" directive isn't really necessary; it just makes the listing (.LST file) much neater by extending lines to 132 columns instead of wrapping them at 80. It also gets rid of some unnecessary page headers by setting the page length to 240 lines. (For some reason, assemblers traditionally feel entitled to clutter listings with page headers.)

The ".386" directive is what tells the assembler to shift from 8088 mode (the default) to '386 mode.

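An important little change is the magic word "use16". Without it, because the .386 directive comes before the segment statement, the assembler assumes a 32-bit ("use32") segment and generates code that only runs correctly in the '386's 32-bit "protected" mode. We need to stay in "real" mode.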

Real mode is the mode that the PC was originally designed to run in, and it's the mode that PCs still boot up into today. Protected mode is the mode that exploits the power of the 32-bit processor. It's the mode that Windows 95 and Linux run in. Real mode is retained so that old programs still run on newer PCs. We need to stay in real mode because DOS is one of those old programs.

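Use16 lets us take advantage of the 32-bit registers while staying in real mode. It does this by making the assembler generate a prefix byte for each instruction that uses 32-bit operands. This 66h "operand-size" prefix tells the processor to treat the one instruction that follows it as a 32-bit operation. (A similar 67h "address-size" prefix is used when an instruction uses 32-bit addressing. Neither prefix has anything to do with "unreal" mode, which is a different trick for getting past real mode's 64K segment limit.) If you look at DECOUT32.LST you'll see a "66|" prefix byte everywhere an extended 32-bit register is used.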
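To make the prefix concrete, here's a small model written in C rather than assembly, which keeps it short and testable. The function `encode_mov_r32_imm` is hypothetical and not part of MASM or any other tool. It assumes the assembler uses the short B8+register opcode for "mov reg, immediate" and builds the bytes for that instruction with and without the 66h prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any real assembler): emit the bytes for
   "mov r32, imm32" using the short B8+reg opcode.
   In a use16 segment the 66h operand-size prefix is needed to get 32-bit
   operands; in a use32 segment 32 bits is already the default.
   Returns the number of bytes written to out (at most 6). */
size_t encode_mov_r32_imm(uint8_t *out, int use16, unsigned reg, uint32_t imm)
{
    size_t n = 0;

    assert(reg < 8);                       /* 0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ... */
    if (use16)
        out[n++] = 0x66;                   /* operand-size prefix */
    out[n++] = (uint8_t)(0xB8 + reg);      /* MOV r32, imm32 opcode */
    for (int i = 0; i < 4; i++)            /* immediate, low byte first */
        out[n++] = (uint8_t)(imm >> (8 * i));
    return n;
}
```

With use16 = 1, reg = 0 (EAX) and 1234567890 (499602D2h), it produces 66 B8 D2 02 96 49. That is the prefix, the opcode, and the immediate stored low byte first: six bytes in all for the `mov eax, 1234567890` at the start of DECOUT32. The same call with reg = 2 (EDX) and 0 gives 66 BA 00 00 00 00, the six bytes of "mov edx, 0".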

One other little change to DECOUT is the use of "xor edx, edx" in place of "mov edx, 0". It seems that assembly language programmers can't resist little optimizations here and there. Both instructions set EDX to zero (XOR also changes the flags, while MOV doesn't), but the XOR requires only three bytes while the MOV would have taken twice as many. (The PC instruction set isn't very efficient for loading small constants into registers.)
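Why clear EDX at all? On the '386, DIV with a 32-bit operand divides the 64-bit value in EDX:EAX, leaving the quotient in EAX and the remainder in EDX. Here's a rough C model of a signed decimal-output routine built that way. It's a sketch of the technique that assumes bigDecout negates negative numbers and then divides by 10 repeatedly. It isn't a translation of the actual code in DECOUT32.ASM, and the name `big_decout` is just for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a signed 32-bit decimal-output routine in the style of
   bigDecout (not the code in DECOUT32.ASM).
   DIV divides EDX:EAX by its operand: quotient -> EAX, remainder -> EDX.
   EDX is cleared before every divide; stale bits left in EDX would become
   the high half of the dividend (and could make DIV overflow).
   Writes the characters plus a NUL into buf, which needs room for 12 chars
   ("-2147483648" + NUL), and returns the number of characters written. */
int big_decout(int32_t value, char *buf)
{
    char digits[10];                   /* 2147483648 has 10 digits */
    int count = 0, len = 0;
    uint32_t eax = (uint32_t)value;

    assert(buf != NULL);
    if (value < 0) {
        buf[len++] = '-';
        eax = 0u - eax;                /* like NEG EAX; also works for INT32_MIN */
    }
    do {
        uint32_t edx = 0;                                  /* xor edx, edx */
        uint64_t dividend = ((uint64_t)edx << 32) | eax;   /* EDX:EAX */
        digits[count++] = (char)('0' + dividend % 10);     /* remainder (EDX) */
        eax = (uint32_t)(dividend / 10);                   /* quotient (EAX) */
    } while (eax != 0);
    while (count > 0)
        buf[len++] = digits[--count];  /* digits come out lowest first */
    buf[len] = '\0';
    return len;
}
```

For example, big_decout(INT32_MIN, buf) fills buf with "-2147483648". An assembly version would more likely push each remainder on the stack than use an array, but the ordering problem is the same: the digits come out lowest first.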

RDTSC

Now let's do something that (those snooty) high-level languages can't do (at least not without resorting to assembly instructions).

The Pentium added the RDTSC instruction. It ReaDs the Time Stamp Counter, a 64-bit counter that increments on each CPU clock cycle, and returns it in EDX:EAX (high half in EDX, low half in EAX). Reading the counter before and after a section of code tells us how many cycles that code took to execute. Dividing this count by the CPU clock speed in megahertz gives the time in microseconds. Since assembly code is often used to speed up time-critical sections of programs written in high-level languages, this is a valuable tool.
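The arithmetic is simple enough to show in a couple of C functions (the names are made up for this sketch). RDTSC splits its 64-bit count across two registers, and since a 1 MHz clock ticks once per microsecond, cycles divided by megahertz gives microseconds. This assumes the counter ticks once per CPU clock, as it does on the Pentium. Many newer processors instead run it at a fixed rate regardless of the current clock speed.

```c
#include <assert.h>
#include <stdint.h>

/* RDTSC leaves the high 32 bits of the count in EDX and the low 32 in EAX. */
uint64_t tsc_from_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Elapsed microseconds = cycles / clock speed in MHz.
   Assumes one counter tick per CPU clock (true on the Pentium). */
double cycles_to_microseconds(uint64_t cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)cycles / clock_mhz;
}
```

For instance, 266,000,000 cycles on a 266 MHz processor works out to 1,000,000 microseconds, or one second.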

Here's how to measure the speed of a loop that's executed 100 million times. The '386 doesn't know about the RDTSC instruction, so we must use the .586 directive.

                .586                    ;Pentium processor
                rdtsc                   ;get initial cycle count
                push    eax             ;save it

                mov     ecx, 100000000  ;loop 100 million times
        sp10:   dec     ecx
                jne     sp10

                rdtsc                   ;get final cycle count
                pop     ecx             ;get initial count
                sub     eax, ecx        ;calculate the difference
                call    bigDecout       ; and display it
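The code above keeps only the low 32 bits of each reading (EAX) and throws EDX away. That works because SUB does its arithmetic modulo 2^32. Even if the low half rolls over from 0FFFFFFFFh to 0 between the two RDTSC instructions, the difference still comes out right, as long as fewer than 2^32 cycles went by. Here's that reasoning modeled in C (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model of "pop ecx / sub eax, ecx": subtract the low 32 bits of the two
   readings. Unsigned C arithmetic wraps modulo 2^32, just like SUB. */
uint32_t elapsed_low32(uint32_t start_eax, uint32_t end_eax)
{
    return end_eax - start_eax;
}

/* The low-32-bit shortcut gives the true count only if the real
   (64-bit) interval is less than 2^32 cycles. */
int fits_in_low32(uint64_t start_tsc, uint64_t end_tsc)
{
    assert(end_tsc >= start_tsc);
    return end_tsc - start_tsc <= UINT32_MAX;
}
```

2^32 cycles is about 32 seconds at 133 MHz but under 2 seconds at 2.4 GHz. Also, bigDecout displays EAX as a signed number, so a count of 2^31 or more would show up as negative. A 100-million-pass loop at a few cycles per pass stays well under that.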

The example program, RDTSC.ASM (in ASM101.ZIP), displays three numbers. The first shows the number of cycles of overhead the RDTSC instruction takes; the second shows the cycles taken by the loop; and the third shows the same loop without the overhead of interrupts (such as the 18.2 Hz system timer).

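If you have trouble running RDTSC, try booting into pure DOS, bypassing your startup files (CONFIG.SYS and AUTOEXEC.BAT). RDTSC can be made a privileged instruction: if the time-stamp disable (TSD) bit in control register CR4 is set, it only works at privilege level 0, so it can fail under memory managers such as EMM386 that run DOS in virtual-8086 mode.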

It's interesting to see how various processors compare:

        Processor       Cycles per iteration
        Pentium 133         6
        Pentium 266         3
        Athlon 2400         2

Addressing

32-bit mode not only gives us bigger numbers to play with, but it also gives us a more uniform way to do indexed addressing. Any 32-bit general register can be used as the base, and any except ESP can be used as the index. Furthermore, the index can be multiplied by a "scale" factor of 1, 2, 4, or 8, which conveniently matches the sizes of bytes, words, double words, and double-precision floating point numbers. The general form for addressing is:

        seg_reg:[base+index*scale+offset]       (offset = displacement)
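The formula is just integer arithmetic that the processor's address adder does for you. A C model makes the rule about scale factors explicit (the function name is invented for this sketch). It leaves out the segment register, which in real mode is added afterward as segment × 16.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 32-bit effective address base + index*scale + offset.
   Scale must be 1, 2, 4, or 8 (byte, word, dword, qword elements).
   The sum wraps modulo 2^32, as the processor's address arithmetic does. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, uint32_t offset)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + offset;
}
```

For instance, with base = 0, index = 2, scale = 4 and offset = 200h it returns 208h: the third double word of a table that starts at offset 200h.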

The third page of PCASM.TXT shows exactly which address modes are allowed.

Here's an example of how the scale factor can be used to index 4-byte values (double words):

        tbl2    dd            1000              ;thousand
                dd         1000000              ;million
                dd      1000000000              ;billion (U.S.)
        . . .

        ;Display the number indexed by esi (= 0, 1, or 2)
                mov     eax, [esi*4+tbl2]       ;fetch number
                call    bigDecout
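The scale factor does the same job as C's array indexing, which multiplies the index by the element size behind your back. This sketch uses a made-up `fetch_dword` helper to do the multiplication by 4 explicitly on the raw bytes of the table, the way the `[esi*4+tbl2]` operand does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three double words of tbl2, in the same order as the assembly table. */
static const uint32_t tbl2[] = { 1000, 1000000, 1000000000 };

/* Model of "mov eax, [esi*4+tbl2]": read 4 bytes starting at byte offset
   esi*4 of the table. The assert is only a convenience of this sketch;
   the processor itself does no bounds check. */
uint32_t fetch_dword(const void *table, size_t count, uint32_t esi)
{
    uint32_t value;

    assert(esi < count);
    memcpy(&value, (const unsigned char *)table + (size_t)esi * 4, sizeof value);
    return value;
}
```

fetch_dword(tbl2, 3, 2) reads bytes 8 through 11 and returns 1000000000, the same value as tbl2[2].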

You would expect 32-bit mode to be able to address more than 64K, but this is normally not the case. Unless the processor has been specially configured, attempting to access a memory location at an offset (from a segment register) outside the range 0 through 65535 will cause a "general protection exception" (interrupt 13). Under Windows 98 you'll get an error message; under DOS your program might simply hang because DOS is too stupid to display the error message. (It doesn't help that on the PC, interrupt 13 is also used by hardware IRQ 5, so the exception lands in the wrong handler.)
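Here's a C model of that rule for an ordinary data access (the function and its parameters are invented for this sketch). In real mode the physical address is segment × 16 + offset, but the normal 64K limit is checked first. The limit applies to every byte of the access, so a double word read at offset 0FFFEh faults too. Accesses through SS raise a stack exception (interrupt 12) instead, which this sketch doesn't model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a real-mode data access of `size` bytes at segment:offset,
   where offset may be a 32-bit value (e.g. from an extended register).
   Returns false where the processor would raise a general-protection
   exception (interrupt 13); otherwise stores the physical address. */
bool real_mode_access(uint16_t segment, uint32_t offset, uint32_t size,
                      uint32_t *physical)
{
    const uint32_t limit = 0xFFFF;           /* normal real-mode segment limit */

    assert(size == 1 || size == 2 || size == 4);
    if (offset > limit || size - 1 > limit - offset)
        return false;                        /* some byte lies past the limit */
    *physical = (uint32_t)segment * 16 + offset;
    return true;
}
```

For example, segment 1000h with offset 0FFFCh gives physical address 1FFFCh for a 4-byte read. Offset 10000h is refused, even though segment × 16 + offset would have been a perfectly good address.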

Boreal (aka: Loren Blaney)